Why Have HMMs Been So Successful for Automatic Speech Recognition and How Might They Be Improved?

Authors

  • Wendy HOLMES
  • Mark HUCKVALE
Abstract

Most of the current successful systems for automatic speech recognition are based on hidden Markov models (HMMs). HMMs are essentially general-purpose statistical pattern matchers, but have so far proved more successful than approaches based on specific knowledge about speech. This paper discusses the likely reasons for this success. It is argued that, in addition to providing a tractable mathematical framework with straightforward algorithms for training and recognition, HMMs have a general structure which is broadly appropriate for speech: both short-term spectral variability and temporal variability can be modelled. This general structure can be tailored to known characteristics of the sounds being modelled, but the actual parameters are optimized based on training data. Another very important characteristic of HMMs is that they provide a complete model which can also be employed at higher levels (such as syntax), with only a single decision being made based on finding the best match at all levels. All these aspects combine to make HMMs a powerful approach to speech modelling. It is therefore argued that, in order to progress recognition capabilities further, the best approach is to retain all these advantages of the general framework but to overcome the limitations of the current HMM formalism by making it segment-based rather than frame-based.
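As a rough illustration of the "straightforward algorithms for recognition" and the frame-based formalism the abstract refers to, the sketch below implements standard Viterbi decoding for a small discrete HMM. It is not taken from the paper: the function name `viterbi`, the two states, the symbols `x` and `y`, and all probabilities are made-up assumptions, and real recognizers score continuous acoustic feature vectors rather than discrete symbols.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for an observation sequence."""
    # best[t][s]: log-probability of the best path that ends in state s at frame t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at frame t
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[-1][last], path

def to_log(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}

# Hypothetical toy model: two states, two observation symbols.
states = ["s1", "s2"]
log_start = {"s1": math.log(0.9), "s2": math.log(0.1)}
log_trans = to_log({"s1": {"s1": 0.6, "s2": 0.4}, "s2": {"s1": 0.1, "s2": 0.9}})
log_emit = to_log({"s1": {"x": 0.8, "y": 0.2}, "s2": {"x": 0.1, "y": 0.9}})

log_prob, path = viterbi(["x", "x", "y", "y"], states, log_start, log_trans, log_emit)
print(path)  # ['s1', 's1', 's2', 's2']
```

Each observation frame is scored independently given its state, which is the frame-based assumption the abstract proposes to relax. Because only one best path is kept, a single decision is made over the whole sequence.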


Similar resources

Modelling asynchrony in automatic speech recognition using loosely coupled hidden Markov models

Hidden Markov models (HMMs) have been successful for modelling the dynamics of carefully dictated speech, but their performance degrades severely when used to model conversational speech. Since speech is produced by a system of loosely coupled articulators, stochastic models explicitly representing this parallelism may have advantages for automatic speech recognition (ASR), particularly when tr...


Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods

For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, researchers have turned to the design and production of speech recognition systems. Among these, lip-reading techniques face many challenges for speech recognition, one of the challenges b...


Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...


Why the Critics of Poor Health Service Delivery Are the Causes of Poor Service Delivery: A Need to Train the Policy-makers; Comment on “Why and How Is Compassion Necessary to Provide Good Quality Healthcare?”

This comment on Professor Fotaki’s Editorial agrees with her arguments that training health professionals in more compassionate, caring and ethically sound care will have little value unless the system in which they work changes. It argues that for system change to occur, senior management, government members and civil servants themselves need training so that they learn to understand the effec...


Loosely coupled HMMs for ASR

Hidden Markov Models (HMMs) have been successful for modelling the dynamics of carefully dictated speech, but their performance degrades severely when used to model conversational speech. This paper presents a preliminary feasibility study of an alternative class of models: loosely coupled HMMs. Since speech is produced by a system of loosely coupled articulators, stochastic models explicitly r...




Publication date: 2002